Abstract

In this paper, we examine how patterns of scientific collaboration contribute to 
knowledge creation. Recent studies have shown that scientists can benefit from 
their position within collaborative networks by being able to receive more infor- 
mation of better quality in a timely fashion, and by presiding over communication 
between collaborators. Here we focus on the tendency of scientists to cluster into 
tightly-knit communities, and discuss the implications of this tendency for scien- 
tific performance. We begin by reviewing a new method for finding communities, 
and we then assess its benefits in terms of computation time and accuracy. While 
communities often serve as a taxonomic scheme to map knowledge domains, they 
also affect how successfully scientists engage in the creation of new knowledge. By 
drawing on the longstanding debate on the relative benefits of social cohesion and 
brokerage, we discuss the conditions that facilitate collaborations among scientists 
within or across communities. We show that successful scientific production occurs 
within communities when scientists have cohesive collaborations with others from 
the same knowledge domain, and across communities when scientists intermediate 
among otherwise disconnected collaborators from different knowledge domains. We 
also discuss the implications of communities for information diffusion, and show 
how traditional epidemiological approaches need to be refined to take knowledge 
heterogeneity into account and preserve the system's ability to promote creative 
processes of novel recombinations of ideas. 
1 Introduction



The recent development of online libraries and efficient search engines allows us to obtain a quantitative description of a number of scientific collaboration networks based on a large number of scientific papers, with precise details about the identity of the authors, the subject of the papers (keyword analysis) and the relations between these papers (citations). This development offers exciting new perspectives and opportunities for understanding how the process of scientific production is organized and evolves over time. This requires not only mapping the intellectual contributions and the scientists who make them, but also studying how information flows among scientists and how they interact with one another. Electronic databases enable us to trace precisely the way scientists exchange, discover and create new information over time, which may help uncover the conditions and mechanisms underpinning the successful transfer and sharing of knowledge, scientific productivity and creativity, such as the development of new areas of investigation and research topics. One way to study how scientists exchange and share information is through the construction of co-authorship networks (Newman, 2003). When analyzing these networks, one may reasonably assume that the authors collaborating on a paper know each other (at least in relatively small collaborations) and have pooled their expertise in order to carry out joint research and co-write the paper. Similarly, citation analysis (Garfield, 1972, 1995; Leydesdorff, 1998) is a tool for evaluating how the ideas and concepts of a paper are used in subsequent works, leading to cascades of influence. In both co-authorship and citation networks, scientific collaboration is typically described in terms of a very large network, usually composed of tens of thousands of nodes, thereby lending itself to statistical description and motivating an analysis that combines the social sciences with complex network theory.



Some of the statistical quantities typically used to describe these networks are purely local and may be employed to measure the quality of a paper from its topological properties. The best-known example is certainly the in-degree of a paper, which is the number of its citations and represents a standard way of quantifying its impact (Garfield, 1972; Wuchty et al., 2007). The corresponding global description is the degree distribution, which is well known to have a long tail for a wide range of different networks (Barabasi et al., 2000). For example, this tail is well fitted by a power-law function in the case of citation networks (Redner, 2005) and of co-authorship networks (Newman, 2001). Other local measures of the topology of the networks include the clustering coefficient, correlations between the degrees of adjacent nodes, etc.
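To make these quantities concrete, the following Python sketch computes the in-degree (citation count) of each paper from a list of citation links, the resulting degree distribution P(k), and its complementary cumulative form P(K ≥ k), which is the form commonly inspected on log-log axes when looking for a long tail. The function names and the toy citation data are hypothetical and only illustrate the definitions; fitting a power law to real data requires more careful statistical methods than shown here.

```python
from collections import Counter


def in_degrees(papers, citations):
    """Number of citations received by each paper.

    papers: iterable of paper identifiers (every cited paper must be listed).
    citations: iterable of (citing, cited) pairs, one per citation link.
    Papers that are never cited keep an in-degree of 0.
    """
    degree = dict.fromkeys(papers, 0)
    for _citing, cited in citations:
        degree[cited] += 1
    return degree


def degree_distribution(degree):
    """P(k): fraction of nodes whose degree equals k."""
    counts = Counter(degree.values())
    n = len(degree)
    return {k: c / n for k, c in sorted(counts.items())}


def complementary_cumulative(distribution):
    """P(K >= k), obtained by summing P(k) from k upwards."""
    result, tail = {}, 1.0
    for k, p in sorted(distribution.items()):
        result[k] = tail
        tail -= p
    return result


# Hypothetical toy data: paper A is cited by B, C and D; B is cited by C.
papers = ["A", "B", "C", "D"]
citations = [("B", "A"), ("C", "A"), ("D", "A"), ("C", "B")]
deg = in_degrees(papers, citations)
print(deg)                                                 # {'A': 3, 'B': 1, 'C': 0, 'D': 0}
print(degree_distribution(deg))                            # {0: 0.5, 1: 0.25, 3: 0.25}
print(complementary_cumulative(degree_distribution(deg)))  # {0: 1.0, 1: 0.5, 3: 0.25}
```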



The previous quantities give information about the local properties of the network around nodes. However, they do not help uncover the highly clustered nature of scientific production, namely the fact that co-authorship networks and citation networks are made of several dense groups of nodes, also called communities, such that there are many links between nodes of the same community and only few links between nodes of different communities (Girvan & Newman, 2002). Such a modular structure is often associated with the high specialization needed to perform research, and with the emergence of disciplines, with their own jargon, interests and techniques (Whitley, 2000). A thorough understanding of this modular structure is important as it helps uncover the organization of scientific production. In this paper, we will first take a structural viewpoint and discuss how a collaboration network can be partitioned into communities by looking at the ties that connect two or more scientists when they co-write a paper. While these communities represent groups of nodes connected through dense overlapping ties, they may also suggest a possible organization of the network into clusters of nodes that are homogeneous with respect to some non-relational attribute. In particular, when communities of connected scientists also represent the set of individuals working in the same scientific disciplines, they may be used as a taxonomic scheme to map knowledge domains (Borner et al., 2003; Boyack et al., 2005; Chen, 2003; Leydesdorff & Rafols, forthcoming) and to track the changing frontiers of science.



The partitioning of scientific collaboration networks into communities that overlap with the organization of the network into distinct disciplines or research areas has important implications for the performance of the scientists working within or across communities. Research in the social sciences has long been concerned with this issue, and has been marked by a sharp debate between two apparently opposed views. One view stresses the benefits of "closed", dense, or cohesive networks for performance (Coleman, 1988), while the other emphasizes the value derived from "open", sparse, or brokered networks, rich in structural holes (Burt, 1992; Granovetter, 1973). We build on, and extend, this debate on the trade-off between social cohesion and brokerage by investigating the conditions under which scientists can undertake successful work by collaborating with others within or outside their own communities. Moreover, the partitioning of the network into communities may have important implications for information diffusion, especially as a result of the sporadic interactions between nodes in different communities. A related well-known example is the synchronization of oscillators on a modular network, in which synchronization takes place very fast within modules, but on a much slower time scale at the global level (Arenas et al., 2006; Barahona & Pecora, 2002). The presence of communities is also known to have a profound impact on the emergence and survival of cooperation (Lozano et al., 2008), and on the possibility for heterogeneous ideas to co-exist in the system (Lambiotte et al., 2007a).
The goal of this paper is to study the role of communities in knowledge creation
from different angles. We will first focus in section 2 on the methods that 
have been developed in order to uncover communities in large networks, and 
propose a method that allows us to study networks of unprecedented size. This 
section will take a structural perspective and will be dedicated to a description 
of the network topology. In section 3, we will extend our structural analysis 
of communities by discussing whether and the extent to which they facilitate 
scientific production. In section 4, we will focus on the diffusion of information 
and examine how communities affect the creation and spreading of new ideas. 
The last section is dedicated to a discussion and summary of our main findings. 



2 Community Detection 



The way we access, use and analyze scientific knowledge has radically changed in the last few years due to the availability of a large number of research databases on the Web, providing us with accurate and complete information about the content of scientific papers, their authors and their relations. As the amount of information on scientific production continues to grow, new tools are needed to extract and organize knowledge, in the same way as Google helps us find our way on the Internet. There are several, often complementary, areas of investigation that require suitable methods of analysis. Among these areas are, for instance, the identification of major researchers or keystone articles (Chen et al., 2007), the discovery of new articles based on readers' previous search behavior and interests (Kautz et al., 1997), and the analysis of emerging trends and of the relations between different disciplines. In general, the aim of all these areas of study is to offer readable maps of knowledge domains.

There are several ways to investigate the organization of scientific production. This can be done at the level of the papers themselves, by imposing a classification scheme, such as the PACS classification in the physics literature, or by organizing databases in terms of the semantic similarity of their contents (Landauer et al., 2004). Another approach consists in representing scientific production as a complex network, where different kinds of nodes (authors and articles) and different kinds of links (who writes with whom, what cites what) are present. This method has the advantage of being flexible, as it does not require a centralized organization into PACS classifications, thereby allowing one to track the self-organization of science and the emergence of fields before a specific new journal has been created or before the field has been recognized as a new category. This flexibility has a cost, however, as such network representations are still very complex, and require careful analysis in order to decompose the multitude of nodes and links into meaningful modules, and to highlight the underlying structures and the relationships between them.
Fig. 1. Some of the partitions of a simple network made of 6 nodes and 9 links. The partition with the highest modularity Q divides the system into 2 communities. In this case, finding the best partition is trivial for reasons of symmetry, but the problem becomes much more complicated when the system is larger and less regular.



This problem is not specific to the mapping of knowledge domains, as it occurs for almost any complex system that can be represented as a network, e.g., friendship networks, metabolic networks, and food-web networks (Girvan & Newman, 2002; Newman, 2003). In general, one way to extract information from these very complicated multi-dimensional systems consists in uncovering their "community structure" (Rosvall & Bergstrom, 2008), namely in dividing the network into groups such that most of the links are concentrated within the groups, while there are only few links between nodes of different groups. In other words, this approach consists in finding a meaningful partition of the network into communities or sub-units. This partition may then be used to produce a coarse-grained description of the full network, by assuming that the nodes belonging to the same community are equivalent, and by considering a higher-level meta-network whose nodes are the communities. This meta-network may then be used to visualize the original network structure. The identification of these communities is therefore of crucial importance, especially because they may overlap with (often unknown) functional modules such as topics in information networks, disciplines in citation networks, or cyber-communities in online social networks.



In the last few years, there has been a concerted interdisciplinary effort to develop mathematical tools and computer algorithms to detect community structure in large networks (Newman, 2004; Newman & Girvan, 2004; Newman, 2006). Such a problem is often computationally intractable and therefore requires approximation methods in order to find reasonably good partitions reasonably fast. The speed of the algorithm has become a crucial factor due to the increasing size of the networks to be analyzed. A large variety of methods have been developed to address this problem (Fortunato & Castellano, 2009). In this paper, we will focus on a type of approach which has proven particularly effective and which is based on the optimization of a quality function, the so-called modularity Q¹ (Newman & Girvan, 2004). The modularity of a partition is a scalar value between -1 and 1 that measures the density of links inside communities as compared to links between communities (see Figure 1). The exact optimization of modularity is computationally hard (Brandes et al., 2006), and a number of algorithms have recently been introduced to deal with this problem. For a comparison of the accuracy and computational cost of different methods, we refer to the excellent review by Danon et al. (2005). The first method proposed to optimize modularity was the divisive algorithm of Girvan & Newman (2002). However, this method is very slow and has been outperformed by more recent methods (Newman, 2006). The best method in terms of accuracy is certainly Simulated Annealing; however, its applicability is limited to systems of relatively small size (Guimera et al., 2004). Until recently, the fastest algorithms were the greedy algorithm of Clauset et al. (2004) and its generalization by Wakita & Tsurumi (2007), which allowed researchers to analyze systems including up to a few million nodes.
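The following minimal Python sketch computes the modularity of a given partition of an undirected, unweighted network. It uses the standard Newman-Girvan definition $Q = \frac{1}{2m}\sum_{i,j}\left[A_{ij} - \frac{k_i k_j}{2m}\right]\delta(c_i, c_j)$, rewritten per community as $Q = \sum_c \left[ l_c/m - (d_c/2m)^2 \right]$, where $l_c$ is the number of links inside community $c$ and $d_c$ the total degree of its nodes. The function name and the toy graph (two triangles joined by a single link) are illustrative assumptions; the graph is not the one drawn in Figure 1.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman-Girvan modularity of a partition of an undirected, unweighted graph.

    edges: list of (i, j) links, each listed once.
    community: dict mapping every node to a community label.
    Computes Q = sum_c [ l_c / m - (d_c / 2m)^2 ], where l_c is the number of
    links inside community c and d_c is the total degree of its nodes.
    """
    m = len(edges)
    links_inside = defaultdict(int)
    degree_sum = defaultdict(int)
    for i, j in edges:
        # Each link adds 1 to the degree of both of its end nodes.
        degree_sum[community[i]] += 1
        degree_sum[community[j]] += 1
        if community[i] == community[j]:
            links_inside[community[i]] += 1
    return sum(links_inside[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Hypothetical example: two triangles joined by a single link (6 nodes, 7 links).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {n: "a" for n in range(6)}
print(round(modularity(edges, two_groups), 3))  # 0.357 (= 5/14)
print(modularity(edges, one_group))             # 0.0
```

Grouping the two triangles gives Q = 5/14 ≈ 0.357, whereas putting all nodes into a single community gives Q = 0 and putting every node in its own community gives a negative value, which is why the two-community split is preferred.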

In this paper, we will use a method which was introduced very recently and which outperforms previous methods in terms of computation time, while having excellent accuracy (Blondel et al., 2008). This method takes advantage of the self-similar nature of complex networks (Song et al., 2005), namely the fact that many networks observed in the real world are composed of several natural levels of organization, i.e., the networks are organized into communities that divide themselves into sub-communities (Arenas et al., 2008; Sales-Pardo et al., 2007). This Multi-Level Aggregation Method (which we will call the "Louvain method" from now on) incorporates such a multi-level organization and consists of two phases that are repeated iteratively². First, the algorithm looks for "small" communities by optimizing modularity in a greedy, local way. Second, the algorithm aggregates nodes of the same community and builds a new network whose nodes are the communities. These phases are repeated iteratively until a (local) maximum of modularity is attained and a partition of the network into communities is found. The choice of this community detection method is motivated by its excellent accuracy and its speed, which allows us to analyze networks of unprecedented size (for example, in Blondel et al. (2008), a network of more than 100 million nodes is analyzed in around 2 hours). The speed of the algorithm therefore opens exciting opportunities, as it allows us to analyze networks made of millions of nodes, and therefore to study whole datasets, instead of dividing them into sub-parts due to limitations of computation time. This speed also enables us to study the evolution of large networks (and therefore the birth, death, merging, etc. of communities), by focusing on several snapshots taken at different points in time (Palla et al., 2007).

¹ The modularity of the partition of a network is given by $Q = \frac{1}{2m}\sum_{i,j}\left[A_{ij} - \frac{k_i k_j}{2m}\right]\delta(c_i, c_j)$, where $\delta(c_i, c_j)$ equals 1 if nodes $i$ and $j$ belong to the same community and 0 otherwise, so that the sum effectively runs over all pairs of nodes belonging to the same community; $m$ is the total number of links, $k_i$ the degree of node $i$, $c_i$ the community of node $i$, and $A$ the adjacency matrix of the network. From a physics perspective, modularity can be interpreted as the Hamiltonian of a q-Potts model with nearest-neighbour interactions (Reichardt & Bornholdt, 2005).

² For a detailed description of the Louvain method and its properties, we refer to the original paper by Blondel et al. (2008). C++ and Matlab versions of the program are freely available at http://findcommunities.googlepages.com
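The two phases described above can be illustrated with the following simplified pure-Python sketch. It is not the authors' C++ or Matlab implementation: nodes are visited in a fixed order, no performance optimizations are attempted, at least one link is assumed, and the function names (`one_level`, `aggregate`, `louvain`) are hypothetical. The local move relies on the standard modularity gain for inserting an isolated node $i$ into a community $C$, $\Delta Q = k_{i,C}/m - d_C\,k_i/(2m^2)$, where $k_{i,C}$ is the weight of the links from $i$ to $C$ and $d_C$ the total degree of $C$.

```python
from collections import defaultdict


def one_level(adj):
    """Phase 1: repeatedly move single nodes to the neighbouring community
    with the largest modularity gain until no move improves modularity.

    adj: symmetric dict-of-dicts {i: {j: A_ij}}. A self-loop entry A_ii
    already counts internal weight twice, so the degree of i is simply
    sum(adj[i].values()). Returns (community of each node, whether any moved).
    """
    k = {i: sum(nbrs.values()) for i, nbrs in adj.items()}
    two_m = sum(k.values())
    comm = {i: i for i in adj}        # start from singleton communities
    tot = dict(k)                     # total degree d_C of each community
    moved_any, improved = False, True
    while improved:
        improved = False
        for i in adj:                 # fixed visiting order (a simplification)
            old = comm[i]
            tot[old] -= k[i]          # take i out of its community
            k_in = defaultdict(float) # link weight from i to each neighbouring community
            for j, w in adj[i].items():
                if j != i:
                    k_in[comm[j]] += w
            # Gain of inserting i into community c, up to the positive factor 1/m:
            #   k_in[c] - d_c * k_i / (2m)
            best = old
            best_gain = k_in[old] - tot[old] * k[i] / two_m
            for c, w in k_in.items():
                gain = w - tot[c] * k[i] / two_m
                if gain > best_gain + 1e-12:   # accept strict improvements only
                    best, best_gain = c, gain
            tot[best] += k[i]
            comm[i] = best
            if best != old:
                improved = moved_any = True
    return comm, moved_any


def aggregate(adj, comm):
    """Phase 2: build the network whose nodes are the communities of phase 1."""
    labels = {c: n for n, c in enumerate(dict.fromkeys(comm.values()))}
    new_adj = {n: defaultdict(float) for n in labels.values()}
    for i, nbrs in adj.items():
        for j, w in nbrs.items():
            # Internal links become self-loops; links between communities add up.
            new_adj[labels[comm[i]]][labels[comm[j]]] += w
    return {n: dict(nbrs) for n, nbrs in new_adj.items()}, labels


def louvain(adj):
    """Alternate the two phases until phase 1 no longer moves any node.
    Returns a dict mapping each original node to its final community."""
    partition = {i: i for i in adj}   # original node -> current super-node
    while True:
        comm, moved = one_level(adj)
        if not moved:
            return partition
        adj, labels = aggregate(adj, comm)
        partition = {i: labels[comm[s]] for i, s in partition.items()}


# Hypothetical example: two triangles joined by a single link.
links = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
graph = {n: {} for n in range(6)}
for a, b in links:
    graph[a][b] = graph[b][a] = 1.0
print(louvain(graph))  # {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
```

On this toy graph the first phase already recovers the two triangles, and at the second level no move improves modularity, so the algorithm stops. Since every accepted move strictly increases modularity, each sweep terminates, but the result is in general only a local maximum of Q.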



We now apply this algorithm to the co-authorship network of the scientists who posted preprints on the Condensed Matter E-Print Archive. To construct the network, we have included all preprints posted between January 1, 1995 and March 31, 2005. This network, whose statistical properties have been described in Newman (2001), exhibits typical features of social networks, such as a high clustering coefficient and a fat-tailed degree distribution. It is composed of N = 40421 scientists and L = 175693 links. The Louvain method finds a partition of modularity Q = 0.729 (made of 1032 communities) in less than 1 second. For the sake of comparison, the method of Clauset et al. (2004) finds a worse partition, of modularity Q = 0.654, in more than 4 minutes. It is also interesting to note that the difference in accuracy and in computation time decreases for a random network where the links between the nodes have been randomly redistributed. In this case, the Louvain method takes 8 seconds to find a modularity of Q = 0.283 (as expected, this value of modularity is smaller than in the case of the original network), while the method by Clauset et al. (2004) finds a modularity of Q = 0.277 in 80 seconds. The Louvain method is slower on the random network because the random network lacks internal structure, which makes the multi-level approach less efficient. It is worth noting, however, that even for a random network the Louvain method remains faster and more accurate than the alternative method. Note also that the Louvain method has recently been applied to co-citation networks (Wallace et al., 2008), where it was shown that the uncovered communities correspond to coherent groups of research and are indeed representative of the structure of a given scientific discipline.
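The text does not specify how the links were randomly redistributed. One simple null model, assumed here, keeps the number of nodes N and of links L and places the links uniformly at random, without self-loops or repeated links (a G(N, L) random graph); a common alternative, not shown, preserves each node's degree by repeatedly swapping link endpoints. The function name `redistribute_links` is hypothetical.

```python
import random


def redistribute_links(nodes, num_links, seed=None):
    """Place num_links distinct links uniformly at random among nodes.

    Produces a G(N, L) random graph with the same numbers of nodes and links
    as the original network, without self-loops or repeated links.
    The degree sequence of the original network is NOT preserved.
    """
    rng = random.Random(seed)
    nodes = list(nodes)
    n = len(nodes)
    if num_links > n * (n - 1) // 2:
        raise ValueError("too many links for a simple graph on these nodes")
    links = set()
    while len(links) < num_links:          # rejection sampling; fine for sparse graphs
        a, b = rng.sample(range(n), 2)     # two distinct node positions
        links.add((min(a, b), max(a, b)))  # store each undirected link once
    return [(nodes[a], nodes[b]) for a, b in sorted(links)]


# Example: 10 nodes and 15 links, reproducible with a fixed seed.
randomized = redistribute_links(range(10), 15, seed=42)
print(len(randomized))  # 15
```

The resulting edge list can be passed to any community detection routine to obtain a baseline modularity against which the value found for the original network can be compared.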



The visualization of the co-authorship network using standard programs such as Visone or Pajek would not be very helpful, as the network would resemble a cloud with too many links and nodes to be distinguished. By contrast, by agglomerating nodes into communities, with an obvious reduction in the size of the system (from roughly 40,000 nodes to roughly 1,000 communities in the above example), and by highlighting the relations between these groups of nodes, the community detection method makes such a visualization possible. Let us illustrate this coarse-graining process by focusing on a smaller collaboration network of scientists working on network theory and experiment, which has been studied in detail in Newman (2006). This network is made of 1589 scientists, 379 of which belong to the largest component. As shown in Figure 2, the Louvain method partitions this largest component into 10 communities and clarifies the network representation.

Fig. 2. By optimising modularity, one uncovers sets of topologically equivalent nodes and the relations between them, thereby allowing one to represent the network in a coarse-grained manner.



3 Social Structure and Knowledge Creation 



In the previous section, we have taken a structural perspective, and have shown how a scientific co-authorship network can be partitioned into communities of tightly-knit scientists by looking at the links among scientists. In this section, we will explore the implications of network structure and its partition into communities for the performance of the scientists. More generally, by affecting the degree to which nodes are exposed to the information flowing in a network, structure affects how successfully nodes undertake their tasks (Smith-Doerr & Powell, 2005). Among these tasks, we will concentrate on knowledge creation and scientific production, which we here define broadly to include all creative intuitions and combinatorial processes leading to scientific and technological advances through novel rearrangements of ideas, theories, methods, processes, strategies, and so on (Burt, 2004; Fleming et al., 2007).



The network foundations of knowledge creation have long been documented in the social sciences (Allen, 1977; Tushman, 1978). Recent empirical studies have uncovered the positive effects of multi-authorship on research performance, suggesting that teams typically produce more frequently cited research than individuals do (Wuchty et al., 2007). Moreover, researchers have been concerned with the mechanisms that underpin the influence of collaborative structures on human creativity not only within the domain of scientific endeavors, but also in the context of artistic production. For example, Uzzi & Spiro (2005) focused on a network of creative artists who made Broadway musicals, and found a nonlinear association between the "small-world" properties³ of the collaborative network and the production of financially and artistically successful shows. In particular, they showed that when the clustering coefficient ratio is low or high, the financial and artistic success of the shows is low, while an intermediate level of clustering is associated with successful shows.



By uncovering the "small-world" network effects on creativity, these results help shed light on a fundamental network mechanism that has long been investigated in the social sciences: social cohesion. Building on Coleman's (1988) conception of social capital, scholars have studied the benefits of cohesive social structures organized into well-defined, tightly knit communities of connected individuals. In particular, the tendency of individuals to forge links locally within groups has often been associated with an increase in one's social capital, in that it engenders a sense of belonging, fosters trust, facilitates the enforcement of social norms, and enables the creation of a common culture (Reagans & McEvily, 2003; Uzzi, 1997; Uzzi & Spiro, 2005). For example, if individual A has links with individuals B and C, a link between B and C would enable the three individuals to detect and punish one another's undesirable behavior more easily, increasing the expected costs of opportunistic behavior with respect to the case in which a link between B and C is absent. Mutual monitoring abilities will in turn engender trust among connected individuals and sustain the generation of group norms more easily and to a greater extent than would be the case if individuals did not have dense and overlapping links with one another.



By fostering trust and promoting the enforcement of social norms, the social cohesion that occurs within communities provides the facilitating conditions for coordination and collaborative endeavors. For example, an abundance of empirical evidence supports the idea that links embedded in social relationships reduce competition and increase the motivation to transfer information. If people who trust one another are more likely to exchange information, cohesion will then enhance information sharing. Individuals in cohesive communities will be able to obtain information in a timely fashion, and will also benefit from the exchange of complex, tacit and proprietary information (Hansen, 1999; Uzzi, 1997). More complete information that can be obtained more easily will in turn facilitate innovation and knowledge creation (Ahuja, 2000; Obstfeld, 2005). Moreover, by engendering a supportive social context, trust sustains risk-taking and learning, with further positive effects on scientific creative endeavors (Amabile et al., 2000; Edmondson, 1999).

³ "Small-world" networks are built from a regular lattice in which a fraction of the links is replaced by random links. By changing this fraction, one interpolates between a regular lattice and a random network (Watts & Strogatz, 1998). Such a model exhibits a high density of triangles as well as a small diameter.
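The small-world construction summarized in the footnote can be sketched as follows, assuming the usual setup of Watts & Strogatz (1998): a ring of n nodes, each linked to its k nearest neighbours, in which each link has its far end moved to a random node with probability p. The function names and parameter values are hypothetical, and the sketch measures only the density of triangles (through the average clustering coefficient), not the diameter.

```python
import random
from itertools import combinations


def ring_lattice(n, k):
    """Regular ring: each of n nodes linked to its k nearest neighbours (k even)."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k // 2 + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj


def small_world(n, k, p, seed=None):
    """Rewire each lattice link (i, i+step) with probability p by moving its far
    end to a uniformly chosen node, avoiding self-loops and duplicate links,
    so the total number of links is unchanged."""
    rng = random.Random(seed)
    adj = ring_lattice(n, k)
    for i in range(n):
        for step in range(1, k // 2 + 1):
            j = (i + step) % n
            if j in adj[i] and rng.random() < p:
                candidates = [x for x in range(n) if x != i and x not in adj[i]]
                if candidates:
                    new = rng.choice(candidates)
                    adj[i].discard(j)
                    adj[j].discard(i)
                    adj[i].add(new)
                    adj[new].add(i)
    return adj


def average_clustering(adj):
    """Mean over nodes of the fraction of neighbour pairs that are themselves linked
    (nodes with fewer than two neighbours contribute 0)."""
    total = 0.0
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k >= 2:
            closed = sum(1 for b, c in combinations(nbrs, 2) if c in adj[b])
            total += closed / (k * (k - 1) / 2)
    return total / len(adj)


for p in (0.0, 0.1, 1.0):
    print(p, round(average_clustering(small_world(200, 4, p, seed=1)), 3))
```

For p = 0 the ring lattice with k = 4 has an average clustering coefficient of exactly 0.5; as p approaches 1 the network approaches a random graph and the clustering coefficient drops sharply.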



Despite all the benefits associated with social cohesion, the tendency of individuals to cluster into tightly knit communities also bears a cost: local redundancy. From a dynamic perspective, the more an individual's additional contacts are already connected to the individual's current ones, the less likely they are to bring the individual closer to new people he or she does not already know. Lack of connections with new social circles may create isolation and eventually degrade performance. Building on the seminal arguments of Granovetter (1973) and Burt (1992), proponents of the benefits of brokerage point out that in cohesive networks organized into communities links tend to be strong, as people invest a disproportionately large amount of their time and resources in relationships with a few others. Cohesive networks thus make links with dissimilar others and exposure to new information less likely. By contrast, in networks that are rich in structural holes, where individuals broker between otherwise disconnected contacts, links tend to be weak and more likely to connect people with different ideas, interests and perspectives (Burt, 2004). If scientific production requires prompt access to novel information, then people embedded in brokered structures will be more creative and successful in their endeavors. From this perspective, brokers between communities occupy the most advantageous boundary position, as they lie at the confluence of fresh and heterogeneous ideas that they can creatively integrate into novel recombinations (Brass, 1995; Burt, 2004).



While scholars in the social sciences agree on the importance of social structure for information diffusion and performance, there is still controversy over the optimal structure and, more specifically, over the relative benefits of social cohesion within communities on the one hand, and brokerage between communities on the other. A number of empirical studies have suggested that an appropriate combination of density and sparseness can provide individuals with the necessary redundant trusted relationships and access to non-redundant external contacts that will enable them to perform their tasks successfully (Burt, 2005; Podolny & Baron, 1997). A more recent line of investigation has examined the apparent trade-off between social cohesion and brokerage by focusing on the interactions between network structure and the attributes of the interacting individuals (Perry-Smith, 2006; Reagans & Zuckerman, 2006; Rodan & Galunic, 2004). For example, Fleming et al. (2007) have empirically examined the mitigating effects exerted by individuals' attributes on the benefits associated with brokerage. Their study suggests that, while brokerage between otherwise disconnected collaborators makes all individuals more likely to create new ideas, at the same time there are marginal contingent positive effects of social cohesion on generative creativity when individuals and their collaborators bring broad experience, have worked for multiple organizations, and have connections with external contacts.



A combined study of network structure and nodes' attributes becomes especially relevant in the context of knowledge creation, scientific production and innovation, where the benefits of social relationships crucially depend on the information scientists already possess as well as on the heterogeneity and breadth of the information they can obtain from their contacts. More generally, there is little consensus on the effects of access to heterogeneous knowledge on performance. On the one hand, recent work has examined the benefits that scientists can gain from specializing, in terms of research productivity, promotion, tenure standards and academic earnings (Leahey, 2007). On the other, there is also evidence that access to novel heterogeneous information is beneficial for creativity and innovation (Burt, 2004; Hargadon & Sutton, 1997). For example, Rodan & Galunic (2004) have found that the variety of knowledge to which managers are exposed positively affects not only their overall performance, but also their ability to accomplish complex tasks and to create and implement novel ideas.



With only a few exceptions (Guimera et al., 2005), and despite the importance of knowledge heterogeneity and inter-disciplinarity for scientific production, little attention has been devoted to the way collaborative structures combine with scientists' degree of specialization and access to pools of diverse knowledge to affect their research performance. Scientists can vary the breadth of their access to knowledge by carefully building their networks and selecting their collaborators either within their own specialty area or in different areas that enable them to obtain knowledge without having to acquire it personally. On the one hand, scientists can reduce access to heterogeneous knowledge by selecting their collaborators within their own specialty area. In so doing, they enhance scientific consensus, and facilitate scientific production through the generation of shared norms of research practice (Moody, 2004). On the other, scientists can expand access to heterogeneous knowledge by engaging in collaborations with other scientists from different specialty areas. While scientists typically rely on their collaborators to obtain the knowledge and expertise they do not already have (Laband & Tollison, 2000), research has largely overlooked the various collaboration patterns through which scientists control their access to heterogeneous knowledge pools, and how these patterns ultimately affect knowledge creation and research performance.



Recent empirical work has investigated the extent to which the interplay
between knowledge heterogeneity and the structure of the collaboration network
affects a scientist's ability to produce high-impact research (Panzarasa &
Opsahl, 2007). Drawing on the collaboration network of the social scientists
who authored or coauthored the publications submitted to the 2001 Research
Assessment Exercise in business and management in the UK, this work shows that
scientists bridging two otherwise disconnected contacts with heterogeneous
knowledge perform better than scientists with no such brokerage
opportunities. At the same time, highly cited scientists also tend to be
socially embedded within communities in which knowledge is homogeneously
distributed across members. In this case, when scientists and their
collaborators are not diverse in knowledge, collaborations are beneficial when
they occur with contacts who are already collaborating with each other.
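
The distinction between brokerage and cohesion drawn above can be made
concrete by classifying every pair of a scientist's collaborators. The sketch
below is a hypothetical illustration, not necessarily the measure used by
Panzarasa & Opsahl (2007): a pair is "brokered" if the two collaborators have
not coauthored with each other and "closed" otherwise, and "heterogeneous" or
"homogeneous" according to whether their (assumed) domain labels differ. The
share of closed pairs is the ego's local clustering coefficient. The function
names, the toy network and the domain labels are all illustrative assumptions.

```python
from itertools import combinations


def ego_pair_profile(adj, domain, ego):
    """Classify each pair of ego's collaborators by closure and knowledge mix."""
    counts = {
        "brokered_heterogeneous": 0,
        "brokered_homogeneous": 0,
        "closed_heterogeneous": 0,
        "closed_homogeneous": 0,
    }
    for a, b in combinations(sorted(adj[ego]), 2):
        # Closed: the two collaborators also work together; brokered: ego bridges them.
        closure = "closed" if b in adj[a] else "brokered"
        # Homogeneous: both collaborators come from the same knowledge domain.
        mix = "homogeneous" if domain[a] == domain[b] else "heterogeneous"
        counts[f"{closure}_{mix}"] += 1
    return counts


def local_clustering(adj, ego):
    """Fraction of pairs of ego's collaborators who also collaborate with each other."""
    pairs = list(combinations(adj[ego], 2))
    if not pairs:
        return 0.0
    return sum(b in adj[a] for a, b in pairs) / len(pairs)


# Hypothetical coauthorship network: ego "E" works with "a" and "b" (same domain,
# coauthors of each other) and with "c" (another domain, unconnected to a and b).
adj = {
    "E": {"a", "b", "c"},
    "a": {"E", "b"},
    "b": {"E", "a"},
    "c": {"E"},
}
domain = {"E": "finance", "a": "finance", "b": "finance", "c": "marketing"}

print(ego_pair_profile(adj, domain, "E"))
print(local_clustering(adj, "E"))
```

Here the ego holds one cohesive, homogeneous tie (a with b) and brokers two
heterogeneous pairs (a with c, b with c), so its local clustering is 1/3.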



This work adds a new dimension to the relevance of communities for knowledge
creation and scientific performance, and more generally to the debate on the
relative benefits of social cohesion and brokerage. On the one hand, when
scientists seek collaborators within their own knowledge pool, they can enhance
their research performance by generating structurally cohesive communities.
Thus, while communities often serve as a taxonomic scheme to map knowledge
domains (Borner et al., 2003; Boyack et al., 2005; Chen, 2003), they also offer
the supportive structural conditions for the successful performance of
collaborative scientific work that remains confined within the boundaries of a
knowledge domain. On the other, when collaborative scientific work spans
knowledge domains, scientific performance increases when scientists
intermediate between their collaborators. Bridging structural holes between
otherwise disconnected knowledge pools creates linkages across distinct
scientific communities that offer knowledge brokerage opportunities for the
novel recombination of ideas.



4 Information diffusion and knowledge heterogeneity 



From a modeling point of view, innovation and knowledge creation can be seen
as a catalytic process (Bruckner & Scharnhorst, 1986, 1990; Hanel et al., 2005;
Lambiotte et al., 2007b). The juxtaposition of ideas in the mind of an
individual may lead to syntheses and to the emergence of new ideas that can
then diffuse, reach other individuals and cascade through the social network.
This propagation may in turn result in further syntheses and in the emergence
of other new ideas, which are then diffused, and so on, thereby leading to a
sequence of self-reproducing flows of new ideas. In principle, a good model
for innovation and knowledge creation should therefore incorporate these two
types of ingredients: synthesis and diffusion. Diffusion has been studied
extensively (Bettencourt et al., 2006; Goffman & Newill, 1964; Goffman, 1966;
Rogers, 2003), especially because of its parallel with the dynamics of an
epidemic. Like a disease, a new idea typically spreads among people who
communicate directly (e.g., by talking, or via telephone and e-mail) or
indirectly (e.g., by reading the same journals).^1 This parallel has motivated
the modeling of the evolution of scientific fields as epidemiological contact
processes, such as the Susceptible-Infected-Recovered (SIR) model or the
discrete-time Independent Cascade Model^2 (Goldenberg et al., 2001;
Kempe et al., 2003).

Mathematical epidemiologists have long emphasized the important role played
by the network topology in determining the properties of disease invasion,
spread and persistence (May & Lloyd, 2001). Several general results have been
derived. For instance, on scale-free networks epidemics spread without a
threshold (in the limit of large networks), i.e. even for arbitrarily small
transmission rates, due to the presence of hubs,^3 which accelerate the
diffusion by reaching an unusually high proportion of other nodes
(Pastor-Satorras & Vespignani, 2001). Another important result is that
diffusion is more efficient in random networks than in clustered networks,
and that the presence of random links is fundamental for promoting diffusion
(Huang & Li, 2007; Vazquez & Moreno, 2003). This result, which supports
Granovetter's (1973) hypothesis of the strength of weak ties, is due to the
fact that random links minimize the accumulation of several contacts around
the same nodes, thereby reducing redundant links and accelerating diffusion
across different parts of the network.

This result, however, needs to be critically reassessed in the light of our
previous discussion about social cohesion. In section 3, we noted that
cohesive structures are likely to foster trust and facilitate knowledge
transfer and sharing. Unlike the above-mentioned results on disease spread,
our analysis thus suggests that dense networks clustered into communities may
accelerate information diffusion, at least locally. This observation cautions
against a direct application of epidemiological models to knowledge
diffusion. To explore this issue, researchers have recently modified the
above models of disease spread so as to preserve and enhance the role of
social cohesion (Watts, 2002). For instance, threshold models assume that
infection requires simultaneous exposure to multiple active neighbors
(Granovetter, 1978; Kempe et al., 2003). Similarly, generalized cascade models
assume that the probability for a node to become "activated" depends on the
number of times it has been in contact with an idea (Dodds & Watts, 2004;
Kleinberg, 2007). Within the context of "small-world" networks, research has
shown that different types of links (random vs. regular) play very different
roles in the propagation of ideas (de Kerchove et al., 2008).

^1 In this paper, we focus on models where a process diffuses on a static
network. This limitation can be overcome by looking at the co-evolution of
diffusion and of network dynamics (Koenig et al., 2008; Vazquez et al., 2008).

^2 In the Independent Cascade Model, one starts from an initial set of infected
nodes. When a node becomes infected, it makes a single attempt to infect each
of its neighbors, succeeding independently with probability p. The process
stops when no newly infected node is left to continue the propagation.
^3 In the context of the diffusion of innovations, the importance of the
heterogeneity of the agents is well known and is usually taken into account by
sorting the agents into categories, e.g. innovators, early adopters, etc.
(Rogers, 2003), or by introducing opinion leaders (Valente & Davis, 1999).
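
The Independent Cascade Model, as described in the footnote above, translates
almost line by line into code. The Python below is a minimal sketch, assuming
an undirected, unweighted network stored as a dictionary of neighbor lists and
a single transmission probability p shared by every link; the helper names
`independent_cascade` and `ring` are hypothetical and not taken from
Goldenberg et al. (2001) or Kempe et al. (2003).

```python
import random


def independent_cascade(adj, seeds, p, rng):
    """Run one realization of the Independent Cascade Model.

    adj   -- dict mapping each node to an iterable of its neighbors
    seeds -- initially infected nodes
    p     -- probability that a single infection attempt succeeds
    rng   -- a random.Random instance, so that runs are reproducible
    """
    infected = set(seeds)
    frontier = list(seeds)  # nodes infected in the previous step
    while frontier:  # stop when no newly infected node is left
        newly_infected = []
        for u in frontier:
            for v in adj[u]:
                # u gets exactly one chance to infect each still-susceptible neighbor v
                if v not in infected and rng.random() < p:
                    infected.add(v)
                    newly_infected.append(v)
        frontier = newly_infected
    return infected


def ring(n):
    """Hypothetical toy network: a cycle of n nodes."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


if __name__ == "__main__":
    net = ring(50)
    rng = random.Random(42)
    sizes = [len(independent_cascade(net, [0], 0.8, rng)) for _ in range(1000)]
    print("mean cascade size:", sum(sizes) / len(sizes))
```

Because a single realization is random, the expected cascade size is usually
estimated by averaging many runs, as the `__main__` block does. With p = 1
every node reachable from the seeds is infected; with p = 0 only the seeds are.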
Random links, i.e. shortcuts connecting otherwise distant regions of the
network, play an integrative role by connecting different communities, and
therefore enable nodes to be exposed to, and explore, different parts of the
network. By contrast, regular links, i.e. links connecting neighbouring nodes,
connect nodes within communities and are likely to increase the number of
paths along which an infection can reach each node. More interestingly, it was
also shown that, when redundancy is needed to secure infection and the
adoption of a new idea, the presence of random links may actually hinder the
emergence of cascades, and that the size of the avalanche depends in a
non-trivial way on the modular structure and on the model parameters
(Centola et al., 2007; Centola & Macy, 2007; de Kerchove et al., 2008).
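
The contrasting roles of regular and random links can be reproduced in a
small, fully deterministic sketch. It assumes a synchronous threshold rule in
which an inactive node becomes active once at least `theta` of its neighbors
are active: `theta = 1` behaves like a simple contagion, `theta = 2` like the
redundancy-requiring case discussed above. As a hypothetical stand-in for
randomly rewired links, and only to keep the output reproducible, the
"shortcut" network keeps each node's nearest neighbors but replaces its
second-nearest neighbors with fixed long-range links (same degree, no
triangles). This is an illustration, not the exact model of Centola et al.
(2007) or de Kerchove et al. (2008); the function names are hypothetical.

```python
def circulant(n, offsets):
    """Ring of n nodes; node i is linked to i + d and i - d (mod n) for each d in offsets."""
    return {i: {(i + s * d) % n for d in offsets for s in (1, -1)} for i in range(n)}


def threshold_cascade(adj, seeds, theta):
    """Synchronous threshold dynamics; returns (active nodes, rounds with new adoptions)."""
    active = set(seeds)
    rounds = 0
    while True:
        # An inactive node adopts once at least theta of its neighbors are active.
        newly = {v for v in adj
                 if v not in active and sum(u in active for u in adj[v]) >= theta}
        if not newly:
            return active, rounds
        active |= newly
        rounds += 1


lattice = circulant(20, (1, 2))    # clustered: only local, overlapping links
shortcuts = circulant(20, (1, 5))  # same degree, but with long-range links

for name, net in [("lattice", lattice), ("shortcuts", shortcuts)]:
    for theta in (1, 2):
        active, rounds = threshold_cascade(net, {0, 1}, theta)
        print(f"{name:9s} theta={theta}: {len(active):2d} active after {rounds} rounds")
```

Starting from two adjacent seeds, the simple contagion reaches all 20 nodes on
both networks, faster with shortcuts (3 rounds instead of 5). With
`theta = 2` the cascade still covers the whole lattice (in 9 rounds), because
each newly active node shares two neighbors with the active block, whereas on
the shortcut network no node has two active neighbors and the cascade never
leaves the seeds.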



This epidemiological approach typically focuses on the diffusion of a single
idea. Starting from one "infected" individual, researchers are interested in
the way an idea propagates among acquaintances in the social network, and try
to estimate the total number of "infected" individuals. This approach,
however, neglects the catalytic nature of knowledge creation, namely the fact
that several ideas diffuse in the system and may, at the same time, be
creatively recombined to produce new ideas. More specifically, what is often
ignored is the role played by heterogeneity between ideas, a property that
most epidemiological models neglect by simply assuming homogeneity throughout
the system. The catalytic nature of knowledge creation thus calls for a
critical reassessment of the network implications for information diffusion.
On the one hand, a rapid diffusion of ideas is crucial as it facilitates
knowledge creation by increasing the number of ideas that individuals can
obtain and recombine. On the other, however, if ideas reach too many people
too quickly, they might generate consensus and lead to convergence toward a
popular, though smaller, set of shared ideas, thereby hindering the innovative
capacity of the system (Fang et al., 2007). In this sense, the presence of
modules, or niches, is necessary to ensure the coexistence of several ideas
and to preserve the fundamental diversity of knowledge conducive to the
production of further new knowledge. This observation has found support, for
instance, in the context of opinion dynamics, where the fragility of consensus
under variations of the network topology was highlighted (Lambiotte et al.,
2007a).
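
The role of modules in preserving diversity can be illustrated with a
deterministic majority-rule toy. The sketch below is hypothetical and is not
the model studied by Lambiotte et al. (2007a): every node synchronously adopts
the opinion held by the majority of itself and its neighbors, keeping its own
opinion on ties. Two assumed communities of ten nodes start with opinion 1
held by all members of the first and by four members of the second (14 of 20
nodes overall); we compare two cliques joined by a single bridge with a
well-mixed network of the same twenty nodes.

```python
from itertools import combinations


def clique_edges(nodes):
    """All pairs of nodes, i.e. a fully connected community."""
    return list(combinations(nodes, 2))


def build(n, edges):
    """Undirected adjacency sets for nodes 0..n-1."""
    adj = {v: set() for v in range(n)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def majority_dynamics(adj, state, max_steps=100):
    """Synchronous majority rule over each node and its neighbors; ties keep the current opinion."""
    for _ in range(max_steps):
        new_state = {}
        for v, nbrs in adj.items():
            ones = state[v] + sum(state[u] for u in nbrs)
            total = 1 + len(nbrs)
            if 2 * ones > total:
                new_state[v] = 1
            elif 2 * ones < total:
                new_state[v] = 0
            else:
                new_state[v] = state[v]
        if new_state == state:  # fixed point reached
            break
        state = new_state
    return state


community_a, community_b = range(0, 10), range(10, 20)
initial = {v: 1 for v in community_a}
initial.update({v: 1 if v < 14 else 0 for v in community_b})

modular = build(20, clique_edges(community_a) + clique_edges(community_b) + [(0, 10)])
well_mixed = build(20, clique_edges(range(20)))

for name, net in [("modular", modular), ("well-mixed", well_mixed)]:
    final = majority_dynamics(net, dict(initial))
    print(name, "-> opinions still present:", sorted(set(final.values())))
```

In the modular network each community settles on its own local majority, so
both opinions survive; in the well-mixed network the globally popular opinion
takes over every node in a single step, erasing the alternative.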



5 Conclusions and discussion 



In this paper, we have focused on the role played by communities in knowl- 
edge creation. By integrating approaches from graph theory, economics, so- 
ciology and physics, we highlighted the relations between network structure, 
performance, and information diffusion, in the specific context of scientific 
production. In section 2, we introduced the concept of community at the
network level, by focusing on links between authors, regardless of the nodes'
attributes. We argued that uncovering communities is necessary in order to
highlight relations between elements, reduce the dimension of the system and
provide useful maps of knowledge. In section 3, we explained how and to what
extent communities can be advantageous for scientific performance. Since
scientists can benefit from cohesive collaborations when their collaborators
belong to the same knowledge domain (Panzarasa & Opsahl, 2007), communities at
the network level will support scientific performance when they reflect
unique, non-overlapping knowledge domains. In this case, successful scientific
production will therefore be organized into a disproportionately large number
of cohesive collaborations among scientists with homogeneous knowledge within
the same community, and relatively few brokered collaborations among
scientists with heterogeneous knowledge across different communities. The
interplay between communities and knowledge creation was then discussed from a
modelling point of view in section 4, where we showed that the creation and
diffusion of knowledge may be driven by different network mechanisms. On the
one hand, random links facilitate rapid communication of ideas within the
network. On the other, when redundancy is needed for individuals to adopt a
new idea, the presence of local structure and communities not only
accelerates diffusion due to the presence of redundant cohesive relationships,
but also promotes diversity of knowledge across communities, thereby
supporting the capacity of the system to innovate through creative
recombinations of different ideas.